Abstract

Background: More than 78,000 protein and/or nucleic acid structures have been determined experimentally.
Only a small portion of these structures corresponds to protein complexes. While homology modeling is
able to exploit knowledge-based potentials of side-chain rotamers and backbone motifs to infer structures for new
proteins, no such general method exists to extend our understanding of protein interaction motifs to novel protein
complexes.

Results: We use a Motif Binding Geometries (MBG) approach to infer the structure of a protein complex from the
database of complexes of homologous proteins taken from other contexts (such as the helix-turn-helix motif
binding double-stranded DNA), and demonstrate its utility on one of the more important regulatory complexes in
biology, that of the RNA polymerase initiating transcription under conditions of phosphate starvation. The modeled
PhoB/RNAP/σ-factor/DNA complex is stereochemically reasonable, has sufficient interfacial Solvent Excluded
Surface Areas (SESAs) to provide adequate binding strength, is physically meaningful for transcription regulation,
and is consistent with a variety of known experimental constraints.

Conclusions: Based on a straightforward and easy-to-comprehend concept, "proteins and protein domains that
fold similarly could interact similarly", a structural model of the PhoB dimer in the transcription initiation complex
has been developed. This approach could be extended to enable structural modeling and prediction of other
biomolecular complexes. Just as models of individual proteins provide insight into molecular recognition, catalytic
mechanism, and substrate specificity, models of protein complexes will provide insight into the
combinatorial rules of cellular regulation and signaling.



Background 

Solving structures of complexes is inherently more diffi- 
cult than solving those for individual proteins. As a 
result, significantly fewer structures of protein com- 
plexes than individual proteins have been determined 
experimentally [1]. In recent years, homology modeling
[2,3] has proved successful when the target protein
has a sequence similar to those of proteins with known
structures. However, the lack of a sufficiently large database
of reference complexes makes the method unsuitable for
structural modeling of protein complexes. A conceptually
simple and straightforwardly applicable approach
for modeling structures of biomolecular complexes is
highly desirable. When new protein complexes are proposed,
the models should be checked against the following
criteria: they should be stereochemically sound, have
sufficient interfacial Solvent Excluded Surface Areas [4]
(SESAs) to provide adequate binding strength, be physically
meaningful for transcription regulation, and be consistent
with the known experimental data.

PhoB is a response regulator of the two-component 
signaling system that is activated under phosphate star- 
vation conditions [5]. It activates more than 30 genes of 
the pho regulon [6]. Structurally similar to many other 
response regulators, PhoB has two domains: an N-term- 
inal Receiver Domain (RD) and a C-terminal Effector 
Domain (ED). The ED of PhoB adopts a winged-helix
structure that consists of three α-helices flanked by two
sets of β-sheets [7]. The PhoB RD adopts a β-α structure
[8] that can be classified as a flavodoxin-like fold
according to SCOP [9]. The flavodoxin-like fold can be
found in RDs of other response regulators as well as in
flavodoxins [10], cytochrome-P450 oxidoreductase [11]
and Toll/Interleukin Receptor (TIR) domains [12]. These
protein domains share the same structural fold with little
or no sequence homology.

While PhoB has long been known to regulate the 
expression of the pho regulon, the specific geometry of 
the transcription initiation complex remains undeter- 
mined. In recent years, a significant amount of work has 
been dedicated to solving structures of RNAP complexes 
(see review articles [13-15]). The bacterial RNA polymerase
(RNAP) is a multi-molecular complex consisting
of five subunits: two α-subunits, a β-subunit,
a β'-subunit and an ω-subunit. To start transcription,
the RNAP first has to bind a σ-subunit. This RNAP/σ-subunit
complex then recognizes and binds to a targeted
DNA operator site to go through the transcription process.
In 2002, the low-resolution (6.5 Å) structure of the
Thermus aquaticus RNAP holoenzyme with a fork-junction
promoter DNA complex (PDB accession code:
1L9Z) was solved [16]. Since then, crystal structures of
different RNAP holoenzymes have been solved to a higher
resolution [17,18] (e.g., PDB accession codes: 1ZYR,
2A6E). More recently, an electron microscopy (EM)-derived
structure of a Catabolite Gene Activator (CAP)-dependent
transcription initiation complex has been
reported [19] (PDB accession code: 3IYD). The structural
information available so far provides a knowledge base
for modeling of the transcription initiation complex
together with the response regulator PhoB. In particular,
the structure of the CAP-dependent transcription
initiation complex (3IYD) provides
an ideal template for modeling the structure of the
PhoB-dependent transcription initiation complex.

Results and discussion 

We begin by considering the PhoB dimer as it interacts
with DNA, for which no complete structure exists. In
the crystal structure of the PhoB ED dimer bound to
pho box DNA (PDB accession code: 1GXP [7], shown as
magenta and white molecules in Figure 1a), the binding
of DNA direct repeats forces the ED dimer to bind with
a tandem symmetry. The known structure of the PhoB
RD dimeric complex [8] (PDB accession code: 2JB9),
however, follows a two-fold rotational symmetry. While
it is possible to simply rotate one of the EDs relative to
the RD to make a complex satisfying both structures,
this procedure results in a tightly stretched linker,
asymmetry between the two PhoBs, and an RD-ED
interface fabricated from scratch. Instead, we examine the
variety of response regulator structures that contain RD
and ED together (PDB accession codes: 1KGS, 1P2F,
1YS6, 2GWR, 2OQR, 1A04, 1YIO). These structures
contain the information of RD/ED MBGs and demonstrate
that the two domains can interact with a variety
of binding geometries.
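
The distinction between the two-fold and the tandem arrangement can be made concrete with a small Python sketch. It is illustrative only and is not part of the modeling procedure described here. It assumes that the rigid-body rotation relating one copy of a domain to the other is already available as a 3x3 matrix; the function names and the 30-degree tolerance are hypothetical choices. A rotation close to 180 degrees is labeled two-fold, and a rotation close to 0 degrees (copies related mostly by a translation) is labeled tandem.

```python
import math


def rotation_angle_deg(rot):
    """Rotation angle (degrees) of a 3x3 rotation matrix, from its trace."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def classify_dimer(rot, tol_deg=30.0):
    """Label the rotation relating two copies of a domain.

    tol_deg is an illustrative tolerance, not a value from the paper.
    """
    angle = rotation_angle_deg(rot)
    if abs(angle - 180.0) <= tol_deg:
        return "two-fold"   # copies related by a ~180 degree rotation
    if angle <= tol_deg:
        return "tandem"     # copies related mostly by a translation
    return "other"


# Example: a 180 degree rotation about the z axis.
two_fold_rotation = [[-1.0, 0.0, 0.0],
                     [0.0, -1.0, 0.0],
                     [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
print(classify_dimer(two_fold_rotation), classify_dimer(identity))
```

In practice the rotation would come from superposing one domain copy onto the other; that superposition step is outside the scope of this sketch.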



Combining the information of RD/ED MBGs with the 
structure of the ED/ED dimeric complex (1GXP), we 
explore the potential solutions for the PhoB dimeric 
complex. Out of the RD/ED conformations, only that of 
DrrB [20] (1P2F, shown as the red and the blue mole- 
cules in Figure 1), a PhoB/OmpR homolog, provides a 
satisfactory solution where the two RDs are in contact 
but not overlapping. Combining the structural informa- 
tion of ED/ED (1GXP), RD/ED (1P2F), ED (1GXP) and 
RD (2JB9), the model of the PhoB dimeric complex is
developed (shown as the white and magenta molecules
bound to DNA in Figure 1b). This model structure has
appealing features, including good stereochemistry (no
clashes between domains, stable interface surface area),
protein-like structure (contents of secondary structures,
density, etc.) and several of the known MBGs.

Because of the tandem head-to-tail orientation, however,
the PhoB dimer in the modeled complex contains a
previously unseen interface between the RDs that differs
from the two-fold symmetry observed in the PhoB RD/RD
dimer (2JB9). The next question is "does the new
MBG between the two RDs in the model exist in other
protein domains of a similar fold?" To answer this question,
we search for interfaces between domains that
have the flavodoxin-like fold and are arranged
with a tandem symmetry. Interestingly, the CheY (a chemotaxis
protein) molecules of the two CheY-P2 heterodimers in
the crystal asymmetric unit [21] (PDB accession code:
1FFG) form a pair of flavodoxin-like molecules that follows
a tandem symmetry. This contact of the two CheYs
(1FFG) in the crystal is very similar to that of the PhoB
dimeric RDs, as shown in Figure 1c. While this particular
CheY dimeric arrangement may not be functionally
relevant for the CheY-CheA interaction, it does provide
a potential MBG for the interaction of flavodoxin-like
molecules.

We turn our attention to the transcription initiation 
complex. We choose to use the transcription initiation 
complex with DNA and the Catabolite Activator Protein 
bound to it (PDB ID: 3IYD) as a template for our 
model. The DNA duplex can serve as a structural link 
and allow the assembly of all the components into one 
functional unit. All the proteins in the complex either
have a direct contact with the DNA molecule (i.e., α-subunit,
σ-subunit, PhoB) or are linked to it through other
molecules (i.e., β-subunit, β'-subunit, ω-subunit).
The DNA molecule that we select for this study is the
E. coli K-12 PhoA promoter (400854 to 400950 bp) with
both σ-subunit and PhoB binding sites (information
derived from RegulonDB [22]). To enable comparison,
the sequences of the two promoters (CAP and PhoA)
are shown in Figure 2a, with the CAP promoter (as
found in 3IYD) shown on the top and the PhoA promoter
shown at the bottom. The protein binding sites on
the two promoters are highlighted in boxes. The main
difference between the two promoters is the relative
binding locations for the two factors. The CAP binding
sites are located upstream of the -35 site, while the PhoB
binding sites overlap with the -35 site. This raised
a structural concern: whether the -35 site and the two
PhoB binding sites can be occupied simultaneously.
When these binding sites are occupied simultaneously, a
set of interactions between the RNAP and the two PhoB
molecules can be predicted by our model.

Figure 1 Structural model of the PhoB dimeric complex binding to its targeted DNA duplex. Matching ED of DrrB to ED of PhoB is
shown in 1a. The resultant PhoB/DNA model is shown in 1b. The RD/RD Motif Binding Geometry (MBG) in CheY (PDB accession code: 1FFG) is
similar to that in the modeled PhoB, and is shown in 1c.

In addition to the difference in the binding sites,
changes in the DNA from 3IYD are required because
the CAP dimer binds and bends the DNA promoter
much more than does the PhoB dimer. Therefore, the
promoter region of the DNA in the PhoB transcription
initiation complex has to be remodeled from the template
structure (3IYD). Using a "motif modeling
approach" as described in our earlier work [23], the
structure of the DNA upstream of this overlapping
region (including the PhoB binding sites) can be modeled
using the structure of DNA from the PhoB ED/DNA
complex (from 1GXP). This promoter DNA is
extended upstream with a piece of canonical DNA
duplex to accommodate the α-subunit C-terminal
domain (CTD) binding. As a comparison, we have modeled
the same piece of DNA upstream of this
overlapping region using only a piece of canonical
B-duplex DNA. The template DNA (from 3IYD), the remodeled
promoter DNA for the PhoB transcription initiation
complex, and the upstream DNA in a canonical
B-duplex conformation are shown in Figure 2b in white,
magenta, and cyan, respectively.

[Figure 2a: promoter sequences of 3IYD (CAP-dependent) and E. coli K-12 (400854 to 400950), numbered from -70 to +20, with boxed binding sites labeled CAP, α-CTD, -35 element and -10 element for 3IYD, and PhoB binding sites, α-CTD, -35 element and -10 element for the PhoA promoter.]

Figure 2 The sequences of the E. coli CAP-dependent and PhoB promoters with the corresponding protein binding sites indicated are
shown in 2a. 2b shows that CAP and PhoB bind and bend the DNA to a different degree than the canonical DNA.

After the structure of the promoter DNA duplex is remodeled,
the corresponding proteins can be assembled
back into the PhoB transcription initiation complex
using the information of their MBGs with their targeted
sites on the DNA (Additional file 1). With the remodeling
of the promoter DNA, the positions and orientations
of α-CTD and σ-CTD are different from those in the
template structure. The connecting loops between the
N-terminal domain (NTD) and CTD of the α- and σ-subunits
also needed to be changed accordingly [24].
The resultant structure (shown in Figure 3a) has the
subunits interacting but not overlapping with each
other, a necessary condition for complex structural
modeling. According to the model, α-CTD, σ-CTD as
well as a segment (residues 839 to 917) of the β-subunit are
in direct contact with the two PhoB molecules in the
complex. To improve the stereochemistry between the
interacting subunits, the remodeled portions of the complex,
including the DNA promoter, the PhoB dimer, the
α-CTD, the σ-CTD and residues 839 to 917 of the β-subunit,
were subjected to a refinement procedure using
AMBER [25].
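
The "interacting but not overlapping" condition can be expressed as a simple distance test between the atoms of two subunits. The Python sketch below is an illustration under stated assumptions, not the procedure used to build the model: the cut-off values (2.5 Å for a clash, 4.5 Å for a contact) and the function names are hypothetical, and atoms are given as plain (x, y, z) tuples in Å.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest distance (in Å) between any atom of A and any atom of B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def classify_pair(atoms_a, atoms_b, clash_cutoff=2.5, contact_cutoff=4.5):
    """Classify two subunits by their closest approach.

    Both cut-offs are illustrative assumptions, not values from the paper.
    """
    d = min_distance(atoms_a, atoms_b)
    if d < clash_cutoff:
        return "overlapping"   # excluded-volume violation
    if d <= contact_cutoff:
        return "interacting"   # in contact without clashing
    return "separated"


# Toy coordinates: one atom per "subunit".
print(classify_pair([(0.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)]))
```

A model in which every pair of neighboring subunits is classified as "interacting" and none as "overlapping" would satisfy the condition stated above; the all-pairs loop is quadratic in the number of atoms, which is acceptable for a sketch but slow for whole complexes.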

The energy-refined structure of this portion of the PhoB
transcription initiation complex is shown in Figure 3b
and a coordinate file is available as supplementary material.
The clearest self-consistency check from our model
is that the overlapping binding sites covering the -35
region allow the simultaneous binding of the PhoB
dimer and the σ-CTD without violating the volume
exclusions for all the molecules involved in the binding.
Both α-CTD and σ-CTD interact directly with one of
the PhoB molecules (shown in red in Figure 3) that
binds to the site upstream of the -35 region. For a more
detailed check on the validity of our model, we note
that the residues at the interface between these molecules
include: R-586, Q-589, I-590, A-592, K-593 from
the σ-CTD, D-258, V-264, A-267, N-268 from the α-CTD
and W-184, G-185, V-190, E-191, D-192 from
PhoB (as highlighted in Figure 4). This result is consistent
with the four PhoB residues (W-184, G-185, V-190
and D-192) identified as being involved in polymerase
binding by a mutation study [26]. The residues on
the two PhoB molecules that interact directly with α-CTD,
β-subunit and σ-CTD are annotated in Figure 4b.
Our results indicate that both the RD and ED domains
of the two PhoB molecules in the dimer are interacting
with the RNAP/σ-subunit of the transcription initiation
complex. The Solvent Excluded Surface Areas for PhoB-a/α-subunit,
PhoB-a/σ-subunit and PhoB-b/β-subunit
are 2,867 Å², 1,098 Å² and 2,165 Å², respectively. These
values are consistent with those (639 Å² to 3,228 Å²)
[27] observed in the heterocomplexes from the PDB.

Figure 3 The modeled structure of the PhoB dimer in the transcription initiation complex. The color-coding of different components in
the complex is shown in 3a. The α, β, β', ω subunits are drawn in magenta, the α-subunit that interacts with PhoB is drawn in cyan, the σ-subunit
is drawn in yellow, the PhoB-a is drawn in red and the PhoB-b is drawn in blue. Figure 3b shows the close-up of molecules (α-CTD, σ-CTD and
segment of β-subunit) interacting with the two PhoB molecules.

Figure 4 Interactions between PhoB molecules and subunits (α-, β- and σ-subunits) of the polymerase complex. In Figure 4a, PhoB-a is
drawn in both ribbon and transparent surface plots while the α- and σ-CTD are drawn in ribbon plots. The residues involved in binding are
highlighted. In Figure 4b, the residues on the two PhoB molecules interacting with the subunits of the polymerase are highlighted with arrows.

Tung and McMahon BMC Structural Biology 2012, 12:3
http://www.biomedcentral.com/1472-6807/12/3

Off-the-shelf software exists that allows docking
of proteins or protein domains into complexes or full proteins
(e.g., ZDOCK [28], AutoDock [29], RosettaDock
[30]). These programs apply different sampling
approaches and scoring functions with various degrees
of success (e.g., see CAPRI [31] assessments). These
docking procedures seem to work best if the
interaction between the components is strong and/or
there exists a global binding minimum. As a quick comparison,
we downloaded one of these programs,
ZDOCK, and generated 2,000 structures (MBGs) docking
the two domains RD (2JB9, residues 3-123) and ED
(1GXP, residues 127-229) to derive the PhoB structure.
The two domains (RD and ED) of the PhoB molecule
are separated by a four-residue loop, which places a
physical limit on how far apart the docked domains
can be. If the cut-off length for a four-residue loop is
set to 14 Å (approximately corresponding to a
completely extended conformation), only 2.12% (43) of
the 2,000 MBGs satisfy the connection criterion. Within
the set of the top 100 MBGs, structures 21 and
96 are the only two that allow the RD-ED connection. A
further look at the PhoB-PhoB dimer structures modeled
from these two ED-RD MBGs and the structure
of the ED-ED-DNA complex (1GXP) shows that neither
structure is stereochemically feasible because of domain
overlap, including protein-protein
and protein-DNA clashes. If all the MBGs of the two domains
from the docking study are compared to the MBG from
our model, the closest is structure 1,934, with an
RMSD of 4.0 Å (based on Cα atoms only).
Overall, the docking procedure is inefficient
(only ~2% of the docked structures satisfy the connectivity
constraint). It was also found that selecting
the relevant PhoB structure from the large pool of
potential MBGs produced by the docking study is a
non-trivial task.
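
The linker-length filter applied to the docked structures can be sketched in a few lines of Python. This is a minimal illustration rather than the script used in the study: the 14 Å cut-off is the value quoted in the text, but the choice of which two atoms define the gap (for example the Cα atoms of the last RD residue and the first ED residue), the tuple layout of each pose, and the function names are assumptions made for the example.

```python
import math

MAX_LINKER_SPAN = 14.0  # Å, cut-off quoted in the text for the four-residue loop


def linker_ok(rd_c_term, ed_n_term, max_span=MAX_LINKER_SPAN):
    """True if the gap between the two domain ends can be bridged by the loop."""
    return math.dist(rd_c_term, ed_n_term) <= max_span


def filter_poses(poses, max_span=MAX_LINKER_SPAN):
    """Return the ids of docked poses that satisfy the connectivity constraint.

    poses: iterable of (pose_id, rd_c_term_xyz, ed_n_term_xyz); this layout
    is hypothetical and would be filled from the docking output.
    """
    return [pose_id for pose_id, c_term, n_term in poses
            if linker_ok(c_term, n_term, max_span)]


# Toy data: three poses with end-to-end gaps of 10, 20 and 14 Å.
toy_poses = [
    ("pose_1", (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),
    ("pose_2", (0.0, 0.0, 0.0), (20.0, 0.0, 0.0)),
    ("pose_3", (0.0, 0.0, 0.0), (0.0, 14.0, 0.0)),
]
kept = filter_poses(toy_poses)
print(kept, f"{100.0 * len(kept) / len(toy_poses):.1f}% of poses kept")
```

Applying such a filter before any clash checking keeps the more expensive dimer assembly restricted to the small subset of poses that can be connected at all.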

Conclusion 

We have demonstrated that Motif Binding Geometry
(MBG) can be used to model the structure of the PhoB
dimer as it interacts with the transcription initiation
complex (PhoB/RNAP/DNA) of E. coli. While the limited
space available for the targeted protein in the molecular
complex makes the modeling of the protein structure
more challenging, it also provides a stringent test for
choosing the relevant structure from the pool of potential
conformations. Because the two domains (ED and RD) of
PhoB adopt different symmetries when crystallized, it is
not obvious how to assemble the PhoB dimer from the
information of its domain structures. Using the excluded
volume information and known MBGs between the ED
and RD, we are able to develop a structural model for the
PhoB dimeric complex in which the two RD domains follow
a tandem symmetry similar to that seen in the two
flavodoxin-like folds of CheY, a chemotaxis protein. The
modeled PhoB dimer can bind to the direct repeat Pho
box in the promoter region and interact directly with the
α-, β- and σ-subunits of the RNAP.

Just as protein structures serve to integrate a variety of 
biochemical information and advance our understanding 
of the enzymatic reactions and molecular machines that 
enable life to continue, modeling of protein complexes 
will shed light on the protein interaction networks 
responsible for regulatory and signaling processes of 
cells. While our approach has not yet been tested with
other protein complexes, it is hoped that the reader will
see our methodology as a way of integrating
evolutionary, physical, and biological experimental data to
produce new, testable hypotheses.

Methods 

Motif Binding Geometry (MBG) used for complex 
homology modeling 

Upon binding, the folds of proteins often remain
unchanged while the specifics of the surface may be
adjusted to accommodate the interactions. Therefore,
while docking of molecules by matching surface shape is
an attractive method in principle, significant errors can
be introduced into the overall binding geometry if
induced fit at the interface is involved during the
binding process. Here, we introduce a structure-based
concept for biomolecular docking by matching the scaffolds
(secondary structural motifs) of the interacting
molecules to those with homologous folds and known
MBGs. This approach is useful for structural modeling,
both to arrange stable folded domains in the intact protein
and to find geometries of individual molecules in
the complex. The method can readily provide a manageable
set of potential solutions for further study and/or
refinement.
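
One way to picture the transfer of a known MBG to a target is as a rigid-body transform: once a template domain has been superposed onto its homologous target domain, the same rotation and translation place the template's binding partner in the target frame. The Python sketch below illustrates only this last step. It assumes the rotation matrix and translation vector have already been obtained from a separate superposition, and the function names are hypothetical.

```python
def apply_transform(coords, rot, trans):
    """Apply x' = rot * x + trans to a list of (x, y, z) points."""
    moved = []
    for point in coords:
        moved.append(tuple(
            sum(rot[i][j] * point[j] for j in range(3)) + trans[i]
            for i in range(3)
        ))
    return moved


def transfer_partner(template_partner, rot, trans):
    """Place the template's binding partner in the target frame.

    rot, trans: the rigid transform that superposes the template domain
    onto the homologous target domain (assumed to be computed elsewhere).
    """
    return apply_transform(template_partner, rot, trans)


# Toy example: rotate 90 degrees about z, then shift by 1 Å along x.
rot_z_90 = [[0, -1, 0],
            [1, 0, 0],
            [0, 0, 1]]
print(transfer_partner([(1, 0, 0)], rot_z_90, (1, 0, 0)))
```

The transferred partner would then still need the excluded-volume and connectivity checks before being accepted as a candidate geometry.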

Motif structural matching 

Protein motifs consist of secondary structural elements
(α-helix and β-sheet) arranged with a specific geometry
in space. In cases where sequence homology is low (e.g.,
< 20% identity), it is difficult to discern structural alignments
using only sequence alignments. A general
approach based on the structural information is required
for motif structural matching. We use the secondary
structural elements to align the motifs. When each of
the secondary structural elements is represented by a
line vector, the structural matching can be accomplished
by minimizing the angles (θ) and the minimum distances
(d) between the set of corresponding line vectors.
The Metropolis Monte Carlo simulation [32] is used for
the minimization procedure.
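
The quantities involved in this matching can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not stated in the text: each line vector is a pair of (start, end) points, the minimum distance is taken between the infinite lines through the two vectors, the angle and distance terms are combined as a weighted sum with hypothetical weights, and the Metropolis step uses the standard acceptance rule with an arbitrary temperature parameter.

```python
import math
import random


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def angle_deg(seg_a, seg_b):
    """Angle theta (degrees) between the directions of two line vectors."""
    u, v = sub(seg_a[1], seg_a[0]), sub(seg_b[1], seg_b[0])
    cos_t = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def line_distance(seg_a, seg_b):
    """Minimum distance d between the infinite lines through two line vectors."""
    u, v = sub(seg_a[1], seg_a[0]), sub(seg_b[1], seg_b[0])
    w = sub(seg_b[0], seg_a[0])
    n = cross(u, v)
    if norm(n) < 1e-9:  # parallel lines: point-to-line distance
        return norm(cross(w, u)) / norm(u)
    return abs(dot(w, n)) / norm(n)


def mismatch(motif_a, motif_b, w_angle=1.0, w_dist=1.0):
    """Weighted sum of theta and d over corresponding line vectors (assumed form)."""
    return sum(w_angle * angle_deg(a, b) + w_dist * line_distance(a, b)
               for a, b in zip(motif_a, motif_b))


def metropolis_accept(old_score, new_score, temperature, rng=random):
    """Standard Metropolis rule: always accept improvements, sometimes accept worse."""
    if new_score <= old_score:
        return True
    return rng.random() < math.exp(-(new_score - old_score) / temperature)
```

A full minimization would repeatedly propose a small rigid-body move of one motif, recompute `mismatch`, and keep or reject the move with `metropolis_accept`; the move generator is omitted here because the text does not describe it.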

Graphics 

Molecular graphics images were produced using the 
UCSF Chimera package [33] from the Resource for Bio- 
computing, Visualization and Informatics at the Univer- 
sity of California, San Francisco. 

Additional material 



Additional file 1: PDB format coordinate file of the modeled complex.